Global Syllable Vectors for Building TTS Front-End with Deep Learning

Authors

Jinfu Ni

Yoshinori Shiga

Hisashi Kawai

Abstract

Recent vector space representations of words have succeeded in capturing syntactic and semantic regularities. In the context of text-to-speech (TTS) synthesis, a front-end is a key component for extracting multi-level linguistic features from text, where the syllable acts as a link between low- and high-level features. This paper describes the use of global syllable vectors as features to build a front-end, evaluated in particular on Chinese. The global syllable vectors directly capture global statistics of syllable-syllable co-occurrences in a large-scale text corpus. They are learned by a global log-bilinear regression model in an unsupervised manner, whilst the front-end is built using deep bidirectional recurrent neural networks in a supervised fashion. Experiments are conducted on large-scale Chinese speech and treebank text corpora, evaluating grapheme-to-phoneme (G2P) conversion, word segmentation, part-of-speech (POS) tagging, phrasal chunking, and pause break prediction. Results show that the proposed method is efficient for building a compact and robust front-end with high performance. The global syllable vectors can be acquired relatively cheaply from plain text resources; therefore, they are vital for developing multilingual speech synthesis, especially for under-resourced language modeling.
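
The abstract does not give the model's equations. As a rough illustration only, the sketch below assumes a GloVe-style formulation of "global log-bilinear regression": syllable-syllable co-occurrences are counted within a context window, and vectors are scored by a weighted least-squares fit to the log counts. The function names, the window size, the 1/distance weighting, the `x_max` and `alpha` values, and the toy pinyin corpus are all hypothetical choices, not details taken from the paper.

```python
from collections import Counter
import math


def cooccurrence_counts(sentences, window=2):
    """Count symmetric syllable-syllable co-occurrences.

    Each pair within `window` positions contributes 1/distance
    (an assumed weighting, borrowed from common GloVe practice).
    """
    counts = Counter()
    for syllables in sentences:
        for i, center in enumerate(syllables):
            for j in range(max(0, i - window), i):
                weight = 1.0 / (i - j)
                counts[(center, syllables[j])] += weight
                counts[(syllables[j], center)] += weight
    return counts


def glove_weight(x, x_max=100.0, alpha=0.75):
    """Down-weight rare pairs and cap the influence of frequent ones."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(counts, w, w_ctx, b, b_ctx):
    """Weighted squared error between (w_i . w_ctx_j + b_i + b_ctx_j) and log X_ij."""
    loss = 0.0
    for (i, j), x in counts.items():
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        err = dot + b[i] + b_ctx[j] - math.log(x)
        loss += glove_weight(x) * err * err
    return loss


# Toy corpus: each sentence is a list of (hypothetical) pinyin syllables.
toy_corpus = [["ni", "hao", "ma"], ["ni", "hao"]]
toy_counts = cooccurrence_counts(toy_corpus, window=2)

# Untrained (all-zero) 2-dimensional vectors and biases for every syllable.
vocab = {s for sentence in toy_corpus for s in sentence}
zero_vecs = {s: [0.0, 0.0] for s in vocab}
zero_bias = {s: 0.0 for s in vocab}
initial_loss = glove_loss(toy_counts, zero_vecs, zero_vecs, zero_bias, zero_bias)
```

In a real system the loss would be minimised with a gradient-based optimiser over a large plain-text corpus, and the resulting syllable vectors would then be fed as input features to the supervised front-end network.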

Similar resources

Syllable HMM based Mandarin TTS and comparison with concatenative TTS

This paper introduces a Syllable HMM based Mandarin TTS system. 10-state left-to-right HMMs are used to model each syllable. We leverage the corpus and the front end of a concatenative TTS system to build the Syllable HMM based TTS system. Furthermore, we utilize the unique consonant/vowel structure of the Mandarin syllable to improve the voiced/unvoiced decision of HMM states. Evaluation results s...

Data pruning approach to unit selection for inventory generation of concatenative embeddable Chinese TTS systems

In this paper, a data pruning approach is presented for building an acoustic unit inventory for a syllable-based concatenative embeddable Chinese TTS system. A 3-portion segmentation of a syllable is proposed based on the nature of the voiced/unvoiced structure of the Chinese syllable. Individual factorial acoustic measurement of syllable is used to calculate the penalty of perceptual unsatisfactory for con...

Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN

This paper presents a text-to-speech (TTS) extension to Kaldi, a liberally licensed open source speech recognition system. The system, Idlak Tangle, uses recent deep neural network (DNN) methods for modelling speech, the Idlak XML based text processing system as the front end, and a newly released open source mixed excitation MLSA vocoder included in Idlak. The system has none of the licensing r...

Development of Speech Database for Hindi Text-To-Speech System Considering Syllable as a Basic Unit

The objective of a text-to-speech system is to convert an orthographic text into intelligible and natural sounding speech. In order to achieve this, unit selection plays a vital role. Phoneme, diphone, allophone and syllable are the basic units of a speech system. Considering the phoneme as a basic unit for a concatenation based TTS system results in larger concatenation points; this results in low qualit...

Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages

Automatic detection of phoneme boundaries is an important sub-task in building speech processing applications, especially text-to-speech synthesis (TTS) systems. The main drawback of Gaussian mixture model hidden Markov model (GMM-HMM) based forced alignment is that the phoneme boundaries are not explicitly modeled. In an earlier work, we had proposed the use of signal processing cues in tan...

Publication date: 2017
